Selection of biopolymer with some correct initial
properties (step 01)



Initial design 

a) Identify positions where substitution is
acceptable and choose substitutions to explore in
the sequence-activity mapping (step 02)

b) Design an initial small set of variants using
experimental design methods (step 03)



Modify initial 
selection 
parameters based 
on performance 
(Step 09) 



Optionally add new 
substitutions from step 02 for 
inclusion in the new variant set 



Synthesize and test the 
variant set for function(s) of 
interest (step 04) 



End-point 
reached 



Propose a new variant set
based on the model(s)
(step 07)



Iterate 



Derive sequence-function relationships: 

a) Use different exploiting algorithms for
modeling sequence-function relationships
(step 05)

b) Combine results from different
sequence-function models (step 06)



Select the best variant(s) (step 08).
Use sequences and activities of
these variants to adjust parameters
used for substitution selection
(step 09) and sequence-function
modeling (step 10).



Modify methods for combining
different sequence-function
models based on performance
(Step 10)



Figure 2 
Selection of antibody with some initial binding (step 1)



Initial design 

a) Identify positions where substitution is acceptable and
choose substitutions to explore (step 2)

b) Design an initial small set of variants using experimental
design methods (step 3)



Modify initial 
selection 
parameters based 
on performance 
(Step 09) 



Optionally add new 
substitutions from step 02 for 
inclusion in the new variant set 



Synthesize and test antibody
variant set for target binding and
viral neutralizing activity (step 4)



End-point 
reached 



Propose a new variant set
based on the model
(step 7)



Iterate 



Derive sequence-activity relationships (step 5)

Combine results from different sequence-function
models (step 6)



Select the best variant(s) (step 8).
Use sequences and activities of
these variants to modify algorithms
used for substitution selection
(step 9) and sequence-function
modeling (step 10).



Modify methods for combining
different sequence-function
models based on performance
(Step 10)



Figure 3 
Identify all sequences homologous to starting enzyme and align (e.g., using
ClustalW)



A: Substitutions from homologous sequences 

• Reconstruct phylogenetic tree
RULE 1a: 

• Calculate conservation index for each position

• Select substitution sites with lowest conservation indices 
RULE 2a: 

• Calculate tolerated heterogeneity for each position 

• Select most heterogeneous substitution sites 
RULE 3a: 

• Calculate relative rates of synonymous and non-synonymous substitution 

• Select sites with highest ratios 
SCORE: 

• Weighted value for each rule satisfied 
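Rule 1a asks for a conservation index per position but does not fix a formula. The sketch below is a minimal illustration that assumes conservation is one minus the Shannon entropy of each alignment column divided by log2(20), with gaps ignored. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column; gaps ('-') ignored."""
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conservation_index(column, alphabet_size=20):
    """1.0 for an invariant column, 0.0 when every residue of the alphabet is equally frequent."""
    return 1.0 - shannon_entropy(column) / math.log2(alphabet_size)


def least_conserved_sites(alignment, k):
    """0-based indices of the k columns with the lowest conservation index (Rule 1a)."""
    columns = list(zip(*alignment))
    scores = [conservation_index(col) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i])[:k]
```

Consider the toy alignment `["MKVLA", "MKILG", "MRVLS", "MKVLT"]`. The fully variable last column ranks first. The two columns that have one substitution each come next.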



B: Substitutions from homologous structures 
RULE 1b: 

• Calculate ΔG change for all single substitutions

• Select changes with less than a defined value (kcal/mol) of change in free energy
RULE 2b: 

• Superpose homologous structures from PDB 

• Estimate mean RMSD for every window of a defined number of residues 

• Select sites with an RMSD above a defined value
RULE 3b: 

• Identify changes found in homologous sequences 

• Select varying sites within a defined distance of catalytic and binding sites 
SCORE: 

• Weighted value for each rule satisfied 
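Rule 2b assumes the homologous structures have already been superposed. The sketch below covers only the windowed RMSD step. It takes each structure as a list of (x, y, z) coordinates for equivalent Cα atoms, given in the same order. The superposition itself is not shown, and the function names and cutoff are hypothetical.

```python
import math
from itertools import combinations


def window_rmsd(a, b, start, width):
    """RMSD between equivalent atoms a[k] and b[k] for k in [start, start + width)."""
    sq = [sum((p - q) ** 2 for p, q in zip(a[k], b[k]))
          for k in range(start, start + width)]
    return math.sqrt(sum(sq) / width)


def mean_window_rmsd(structures, width):
    """Mean pairwise RMSD for every window start; structures must already be superposed."""
    n_res = len(structures[0])
    pairs = list(combinations(structures, 2))
    return [sum(window_rmsd(a, b, s, width) for a, b in pairs) / len(pairs)
            for s in range(n_res - width + 1)]


def variable_windows(structures, width, cutoff):
    """Window start indices whose mean RMSD exceeds the cutoff (Rule 2b selection)."""
    return [s for s, r in enumerate(mean_window_rmsd(structures, width)) if r > cutoff]
```

With all structures included, the cost is quadratic in the number of structures. For large PDB sets, comparing each structure against a single reference would be a cheaper alternative.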



C: Substitutions from substitution matrices 
RULE 1c: 

• Calculate substitution matrix for specific biopolymer family, rank all possible 
single substitutions for favorability 

• Select highest scoring positions 
RULE 2c: 

• Rank all possible single substitutions for favorability using a universal 
substitution matrix 

• Select highest scoring positions 
SCORE: 

• Weighted value for each rule satisfied 



D: Substitutions from PCA analysis

RULE 1d: 

• Determine principal components of sequence variation in alignment of homologs 

• Select highest scoring positions 
SCORE: 

• Weighted value for each rule satisfied 



Figure 4 
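Rule 1d here, and Rules 1d and 2d of the antibody version, rank positions by their contribution to the principal components of sequence variation. The sketch below is a minimal illustration built on several assumptions, none of which come from the figure:

- Each observed (site, residue) pair is one-hot encoded.
- PCA is performed on the feature covariance matrix, and only PC1 is found, by power iteration.
- A site's contribution is its share of the squared PC1 loadings.

```python
import random


def pc1_site_contributions(alignment, iters=200, seed=0):
    """Share of the first principal component's squared loadings falling on each site.

    alignment: equal-length aligned sequences. Each (site, residue) pair observed in
    the alignment becomes one 0/1 feature. Returns one value per site (sums to 1).
    """
    n, length = len(alignment), len(alignment[0])
    features = [(i, aa) for i in range(length)
                for aa in sorted({seq[i] for seq in alignment})]
    rows = [[1.0 if seq[i] == aa else 0.0 for i, aa in features] for seq in alignment]
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(features)
    cov = [[sum(r[a] * r[b] for r in centred) / n for b in range(d)] for a in range(d)]
    # Random start: the all-ones vector is always in the null space of this matrix.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return [0.0] * length  # no variation anywhere in the alignment
        v = [x / norm for x in w]
    contrib = [0.0] * length
    for (site, _), loading in zip(features, v):
        contrib[site] += loading * loading
    return contrib
```

For `["AKA", "AKA", "ARA", "ARG"]` this gives roughly `[0.0, 0.62, 0.38]`. The invariant first site contributes nothing, and the evenly split second site dominates PC1.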



WO 2005/013090 PCT/US2004/024752
Select initial candidate antibody sequence(s)



A: Substitutions from homologous sequences in framework & CDR

• Identify framework sequences within a defined evolutionary distance (PAM units) 

• Reconstruct phylogenetic tree for framework region only 
RULE 1a: 

• Select a defined number of framework residues that have undergone advantageous
change
RULE 2a: 

• Select a defined number of framework positions and a defined number of CDR positions
with the highest mutability index

RULE 3a: 

• Select all amino acid substitutions from sequences in the same Chothia class
SCORE: Weighted value for each rule satisfied



B: Substitutions from homologous structures 
• Superpose homologous structures from PDB
RULE 1b: 

• Estimate mean RMSD for every window of a defined number of residues 

• Select framework sites with an RMSD greater than a defined value 
RULE 2b: 

• Identify changes found in homologous sequences
• Select framework varying sites within a defined distance from CDR
SCORE: Weighted value for each rule satisfied



C: Substitutions from substitution matrices

• Calculate substitution matrix for all framework regions and canonical classes, rank
all possible single substitutions for favorability
RULE 1c: 

• Select highest scoring substitutions for each matrix 
RULE 2c: 

• Rank all possible single substitutions for favorability using a universal substitution 
matrix 

• Select highest scoring positions
SCORE: Weighted value for each rule satisfied 



D: Substitutions from PCA analysis

• Determine principal components of sequence variation in alignment of homologs 
RULE 1d: 

• Group CDRs based on PCA of amino acid sequences in the CDR.

• Select highest scoring CDR positions that differentiate antibody sequences by function

RULE 2d: 

• Group CDR positions based on observed amino acid frequencies at each site

• Rank the groups based on contributions to one or more principal components

• Select the top groups of sites to vary.

SCORE: Weighted value for each rule satisfied 



E: Substitutions from binding pocket analysis
RULE 1e: 

• Select sites where the physicochemical properties of residues in the pocket
are conserved
RULE 2e: 

• Select CDR changes derived from evolutionary models to correlate properties of
amino acids.

SCORE: Weighted value for each rule satisfied 



Figure 5 
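Each SCORE line in Figures 4 and 5 combines the rules as "a weighted value for each rule satisfied". The sketch below assumes this is a simple weighted sum. The weight table and the candidate labels are hypothetical placeholders, because the figures leave both open.

```python
# Hypothetical weights per rule id; the figures do not specify a weighting.
RULE_WEIGHTS = {"1a": 2.0, "2a": 1.0, "3a": 1.0, "1b": 1.5, "2b": 1.0,
                "1c": 0.5, "2c": 0.5, "1d": 1.0, "2d": 1.0, "1e": 1.5, "2e": 1.0}


def rule_score(satisfied, weights=RULE_WEIGHTS):
    """SCORE: sum of the weights of every rule a candidate substitution satisfies."""
    return sum(weights[rule] for rule in satisfied)


def rank_candidates(candidates, weights=RULE_WEIGHTS):
    """candidates maps a (hypothetical) substitution label to the set of rule ids it meets."""
    return sorted(candidates, key=lambda s: rule_score(candidates[s], weights), reverse=True)
```

The weights could be replaced by values tuned from assay results in later rounds, which is the kind of parameter adjustment step 09 describes.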
A: Calculate average weight vectors:

a) Build sequence-function model from data
in which rows (sequence + function) and/or
columns (substitutions) are randomly
omitted. Calculate weight vectors.

b) Repeat 1,000 times.

c) Calculate average values and standard
deviations for weight vectors and rank in
order of importance.



Figure 6 
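Figure 6 does not say which sequence-function model is refitted. The sketch below makes four assumptions, none taken from the figure:

- The model is a ridge-regularised linear model with substitutions coded 0/1 and an unpenalised intercept.
- It is solved by plain Gaussian elimination, so only the standard library is needed.
- Rows and columns are each kept with probability 0.8.
- "Importance" means the absolute mean weight.

```python
import random
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(X, y, lam=1e-3):
    """Ridge fit of y ~ w0 + sum_j w_j * X[i][j] (w0 unpenalised); returns w_1..w_k."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(A, b)[1:]


def resampled_weights(X, y, n_rep=1000, keep_rows=0.8, keep_cols=0.8, seed=0):
    """Figure 6 a-c: refit on random row/column subsets, then summarise each weight.
    Returns ([(mean, stdev) per substitution], indices ranked by |mean|)."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    samples = [[] for _ in range(n_cols)]
    for _ in range(n_rep):
        rows = [i for i in range(len(X)) if rng.random() < keep_rows]
        cols = [j for j in range(n_cols) if rng.random() < keep_cols]
        if not rows or not cols:
            continue  # nothing left to fit in this replicate
        w = fit_weights([[X[i][j] for j in cols] for i in rows], [y[i] for i in rows])
        for j, wj in zip(cols, w):
            samples[j].append(wj)
    # Assumes every substitution was kept in at least two replicates.
    stats = [(statistics.mean(s), statistics.stdev(s)) for s in samples]
    ranking = sorted(range(n_cols), key=lambda j: abs(stats[j][0]), reverse=True)
    return stats, ranking
```

As a usage example, take a full-factorial toy set of three substitutions whose true effects are +2, 0 and -1. The averaged weights land close to those values, and the ranking comes out as substitution 0, then 2, then 1.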



A: Calculate average weight vectors: 

a) Build sequence-function model from data 
in which rows (sequence + function) and/or 
columns (substitutions) are randomly 
omitted. Calculate weight vectors. 

b) Repeat 1,000 times. 

c) Calculate average values and standard 
deviations for weight vectors and rank in 
order of importance. 



B: Calculate weight vectors from 
randomized data: 

a) Randomly associate sequence data with 
function data 

b) Build sequence-function model and 
calculate weight vectors. 

c) Repeat 1,000 times.

d) Calculate average value and standard 
deviations for weight vectors from 
randomized data. 



C: Calculate number of standard deviations weight 
vector is above value from randomized data. 



Figure 7 
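Parts B and C of Figure 7 amount to a permutation test: shuffle activities against sequences, refit, and express each real weight in null standard deviations. The sketch below makes two simplifications for brevity:

- A single full-data fit stands in for the averaged weights of part A.
- A substitution's weight is the difference in mean activity with and without it. This equals the least-squares weight only for balanced, orthogonal designs such as the full-factorial toy data in the test, so a general model would be substituted in practice.

Function names are hypothetical.

```python
import random
import statistics


def marginal_weights(X, y):
    """Mean activity with each substitution minus mean activity without it.
    Equals the least-squares weight only for balanced, orthogonal designs."""
    weights = []
    for j in range(len(X[0])):
        on = [t for row, t in zip(X, y) if row[j]]
        off = [t for row, t in zip(X, y) if not row[j]]
        weights.append(statistics.mean(on) - statistics.mean(off))
    return weights


def permutation_z_scores(X, y, n_rep=1000, seed=0):
    """Figure 7 B and C: null weights from shuffled activities, then z-scores."""
    rng = random.Random(seed)
    observed = marginal_weights(X, y)
    null = [[] for _ in observed]
    shuffled = list(y)
    for _ in range(n_rep):
        rng.shuffle(shuffled)  # B(a): randomly re-associate sequences with activities
        for j, w in enumerate(marginal_weights(X, shuffled)):
            null[j].append(w)
    # C: number of null standard deviations each observed weight lies above the null mean
    return [(w - statistics.mean(n)) / statistics.stdev(n)
            for w, n in zip(observed, null)]
```

With only eight variants the null distribution is wide. Even the strongest substitution in the toy data therefore sits only a little over two standard deviations above it.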
E. coli leader peptide

-20        -10     -1
MKKLLFAIPL VVPFYSHSTM (SEQ ID NO.: 1)

Proteinase K

1          11         21         31         41
APAVEQRSEA APLIEARGEM VANKYIVKFK EGSALSALDA AMEKISGKPD
51         61         71         81         91
HVYKNVFSGF AATLDENMVR VLRAHPDVEY IEQDAVVTIN AAQTNAPWGL
101        111        121        131        141
ARISSTSPGT STYYYDESAG QGSCVYVIDT GIEASHPEFE GRAQMVKTYY
151        161        171        181        191
YSSRDGNGHG THCAGTVGSR TYGVAKKTQL FGVKVLDDNG SGQYSTIIAG
201        211        221        231        241
MDFVASDKNN RNCPKGVVAS LSLGGGYSSS VNSAAARLQS SGVMVAVAAG
251        261        271        281        291
NNNADARNYS PASEPSVCTV GASDRYDRRS SFSNYGSVLD IFGPGTSILS
301        311        321        331        341
TWIGGSTRSI SGTSMATPHV AGLAAYLMTL GKTTAASACR YIADTANKGD
351        361        371
LSNIPFGTVN LLAYNNYQAV DHHHHHH (SEQ ID NO.: 2)
-60 -50 -40 -30 -20 -10 -1

atgaaaaaac tgctgttcgc gattccgctg gtggtgccgt tctatagcca tagcaccatg 

1 11 21 31 41 51 

GCACCGGCCG TTGAACAGCG TTCTGAAGCA GCTCCTCTGA TTGAGGCACG TGGTGAAATG 

61 71 81 91 101 111 

GTAGCAAACA AGTACATCGT GAAGTTCAAG GAGGGTTCTG CTCTGTCTGC TCTGGATGCT 

121 131 141 151 161 171 

GCTATGGAAA AGATCTCTGG CAAGCCTGAT CACGTCTATA AGAACGTGTT CAGCGGTTTC 

181 191 201 211 221 231 

GCAGCAACTC TGGACGAGAA CATGGTCCGT GTACTGCGTG CTCATCCAGA CGTTGAATAC 

241 251 261 271 281 291 

ATCGAACAGG ACGCTGTGGT TACTATCAAC GCGGCACAGA CTAACGCACC TTGGGGTCTG 

301 311 321 331 341 351 

GCACGTATTT CTTCTACTTC CCCGGGTACG TCTACTTACT ACTACGACGA GTCTGCCGGT 

361 371 381 391 401 411 

CAAGGTTCTT GCGTTTACGT GATCGATACG GGCATCGAGG CTTCTCATCC TGAGTTTGAA 

421 431 441 451 461 471 

GGCCGTGCAC AAATGGTGAA GACCTACTAC TACTCTTCCC GTGACGGTAA TGGTCACGGT 

481 491 501 511 521 531 

ACTCATTGCG CAGGTACTGT TGGTAGCCGT ACCTACGGTG TTGCTAAGAA AACGCAACTG 

541 551 561 571 581 591 

TTCGGCGTTA AAGTGCTGGA CGACAACGGT TCTGGTCAGT ACTCCACCAT TATCGCGGGT 

601 611 621 631 641 651 

ATGGATTTCG TAGCGAGCGA TAAAAACT^C CGCAACTGCC CGAAAGGTGT TGTGGCTTCT 

661 671 681 691 701 711 

CTGTCTCTGG GTGGTGGTTA CTCCTCTTCT GTTAACAGCG CAGCTGCACG TCTGCAATCT 

721 731 741 751 761 771 

TCCGGTGTCA TGGTCGCAGT AGCAGCTGGT AACAATAACG CTGATGCACG CAACTACTCT 

781 791 801 811 821 831 

CCTGCTAGCG AGCCTTCTGT TTGCACCGTG GGTGCATCTG ATCGTTATGA TCGTCGTAGC 

841 851 861 871 881 891 

TCCTTCAGCA ACTATGGTTC CGTCCTGGAT ATCTTCGGCC CTGGTACTTC TATCCTGTCT 



Figure 9A 
901 911 921 931 941 951

ACCTGGATTG GCGGTAGCAC TCGTTCCATT TCCGGTACGA GCATGGCTAC TCCACATGTT 

961 971 981 991 1001 1011 

GCTGGTCTGG CAGCATACCT GATGACCCTG GGTAAGACCA CTGCTGCATC CGCTTGTCGT 

1021 1031 1041 1051 1061 1071 

TACATCGCGG ATACTGCGAA CAAAGGCGAT CTGTCTAACA TCCCGTTCGG CACCGTTAAT 

1081 1091 1101 1111 1121 1131 

CTGCTGGCAT ACAACAACTA TCAGGCTgtc gaccatcatc atcatcatca tag 

(SEQ ID NO.: 3) 



Figure 9B 
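Figures 9A and 9B give the DNA that encodes the E. coli leader peptide and the His-tagged proteinase K listed above. A quick way to check such a pairing is to translate the DNA in frame. The sketch below assumes the standard genetic code (NCBI translation table 1). The helper name is hypothetical, and it maps unreadable codons to "X".

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 1 until the first stop codon.

    Whitespace and case are ignored, codons that are not in the table become 'X',
    and a trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Applied to the first line of Figure 9A (positions -60 to -1), `translate` returns the leader peptide MKKLLFAIPLVVPFYSHSTM (SEQ ID NO.: 1).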
gi|19171215|emb|CAD20578.1|/89
gi|19171217|emb|CAD20579.1|/1-
gi|19171219|emb|CAD20580.1|/1-
gi|19171221|emb|CAD20581.1|/1-
gi|16215662|emb|CAC95042.1|/90
gi|16506136|dbj|BAB70705.1|/78
gi|16506134|dbj|BAB70704.1|/78
gi|16506140|dbj|BAB70707.1|/78
gi|16215677|emb|CAC95049.1|/90
gi|117631|sp|P29138|CUDP_METAN
gi|6624958|emb|CAB63911.1|/90-
gi|16215669|emb|CAC95045.1|/90
gi|460032|gb|AAA91584.1|/84-36
gi|6634475|emb|CAB64346.1|/87-
gi|16215664|emb|CAC95043.1|/87
gi|2351388|gb|AAC49831.1|/86-3
gi|8671180|emb|CAB95012.1|/85-
gi|16215666|emb|CAC95044.1|/85
gi|16215671|emb|CAC95046.1|/85
gi|4092486|gb|AAC99421.1|/64-2
gi|18542429|gb|AAL75579.1|AF46
SUTIKA/91-367

gi|131077|sp|P06873|PRTK_TRIAL
gi|230675|pdb|2PRK|/1-277
gi|494434|pdb|1PEK|E/1-277
gi|224977|prf||1205229A/1-27S
gi|l427865e|pdb|1IC6|A/1-277
gi|131084|sp|P23653|PRTR_TRIAL
gi|4761119|gb|AAD29255.1|AF104
gi|14626933|gb|AAK70804.1|/81-
gi|639712|gb|AAC48979.1|/83-34
gi|742825|prf||2011184A/84-362
gi|628051|pir||JC2142/84-362
gi|15808791|gb|AAL08502.1|AF41
gi|15808805|gb|AAL08509.1|AF41
gi|28918475|gb|EAA28148.1|/90-
gi|10181226|gb|AAC27316.2|/92-
gi|131088|sp|P20015|PRTT_TRIAL
gi|9971109|emb|CAC07219.1|/86-
gi|7543916|emb|CAB87194.1|/89-
gi|5813790|gb|AAD52013.1|AF082
gi|23894244|emb|CAD23614.1|/II
gi|22652141|gb|AAN03634.1|AF40
gi|24528136|emb|CAD24010.1|/10
gi|24528132|emb|CAD24008.1|/10
A35742./126-403

gi|114081|sp|P08594|AQL1_THEAQ

AAA82980./129-408

gi|15640187|ref|NP_229814.1|/1

AAA22247./107-381
Residue PC1 contrib.
Variation  Score  Primary contribution to score

N95C       5      Structural stability at higher temperature: from published literature
P97S       3      P to S for flexibility and structural perturbation
S107D      5      From active homologs
S123A      7      Thermostable consensus
E138A      5      From experiments in literature
M145F      5      From experiments to improve thermostability
Y151A      8      From experiments to improve thermostability
V167I      10     Allow user-specified conservative changes (controlled perturbation)
L180I      10     Allow user-specified conservative changes (controlled perturbation)
Y194S      10     Variation observed in highly active clone from our initial experiments
A199S      8      Allow user-specified conservative changes (controlled perturbation)
K208H      7      PCA modelling of homologs collected from GenBank
A236V      7      PCA modelling of homologs collected from GenBank
R237N      5      From experiments to improve thermostability (in literature)
P265S      3      P to S for flexibility and structural perturbation
V267I      10     Allow user-specified conservative changes (controlled perturbation)
S273T      15     Multiple sources identify this change (thermostability and other)
G293A      8      For thermostability considerations (observed in thermitases)
L299C      5      For disulphide bridges with N95C (from literature)
I310K      5      From structural studies
K332R      8      For thermostability considerations (observed in thermitases)
S337N      8      For thermostability considerations (observed in thermitases)
P355S      3      P to S for flexibility and structural perturbation
variant-1: 123, 151, 293, 310, 332, 355
variant-2: 95, 145, 167, 199, 237, 273
variant-3: 97, 138, 180, 194, 236, 267
variant-4: 107, 132, 208, 265, 299, 337
variant-5: 123, 145, 151, 167, 273, 337
variant-6: 97, 107, 180, 236, 237, 310
variant-7: 123, 138, 199, 208, 265, 355
variant-8: 95, 194, 267, 293, 299, 332
variant-9: 95, 132, 138, 145, 167, 208
variant-10: 236, 237, 273, 293, 332, 355
variant-11: 97, 123, 265, 299, 310, 337
variant-12: 107, 151, 180, 194, 199, 267
variant-13: 95, 107, 123, 180, 194, 337
variant-14: 138, 151, 167, 199, 208, 299
variant-15: 97, 145, 237, 273, 293, 310
variant-16: 132, 236, 265, 267, 332, 355
variant-17: 97, 151, 199, 236, 299, 355
variant-18: 95, 107, 167, 180, 293, 310
variant-19: 145, 237, 265, 267, 332, 337
variant-20: 123, 132, 138, 194, 208, 273
variant-21: 123, 208, 236, 267, 293, 299
variant-22: 107, 132, 138, 145, 337, 355
variant-23: 97, 180, 194, 199, 265, 310
variant-24: 95, 151, 167, 237, 273, 332

Figure 16 
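A variant set like Figure 16 can be checked for how evenly it samples the candidate positions. The sketch below transcribes the listing as reproduced here. `block_coverage` is a hypothetical helper that reports which positions a group of variants does not hit exactly once.

On this listing, every position occurs six times except 123 (seven) and 132 (five). Variants 1-4, 9-12, 13-16, 17-20 and 21-24 each cover all 24 positions exactly once.

```python
from collections import Counter

# Figure 16, transcribed: variant number -> substituted positions.
VARIANTS = {
    1: (123, 151, 293, 310, 332, 355), 2: (95, 145, 167, 199, 237, 273),
    3: (97, 138, 180, 194, 236, 267), 4: (107, 132, 208, 265, 299, 337),
    5: (123, 145, 151, 167, 273, 337), 6: (97, 107, 180, 236, 237, 310),
    7: (123, 138, 199, 208, 265, 355), 8: (95, 194, 267, 293, 299, 332),
    9: (95, 132, 138, 145, 167, 208), 10: (236, 237, 273, 293, 332, 355),
    11: (97, 123, 265, 299, 310, 337), 12: (107, 151, 180, 194, 199, 267),
    13: (95, 107, 123, 180, 194, 337), 14: (138, 151, 167, 199, 208, 299),
    15: (97, 145, 237, 273, 293, 310), 16: (132, 236, 265, 267, 332, 355),
    17: (97, 151, 199, 236, 299, 355), 18: (95, 107, 167, 180, 293, 310),
    19: (145, 237, 265, 267, 332, 337), 20: (123, 132, 138, 194, 208, 273),
    21: (123, 208, 236, 267, 293, 299), 22: (107, 132, 138, 145, 337, 355),
    23: (97, 180, 194, 199, 265, 310), 24: (95, 151, 167, 237, 273, 332),
}


def position_counts(variants):
    """How many variants carry a substitution at each position."""
    return Counter(p for changes in variants.values() for p in changes)


def block_coverage(variants, block):
    """Positions that the variants in `block` do not hit exactly once, with their counts."""
    seen = Counter(p for v in block for p in variants[v])
    return {p: seen[p] for p in sorted(position_counts(variants)) if seen[p] != 1}
```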
Variant #    Changes                    Reasons

variant-25   95                         Confirm detrimental effect on enzyme
variant-26   97                         Confirm detrimental effect on enzyme
variant-27   138                        Confirm detrimental effect on enzyme
variant-28   208                        Confirm detrimental effect on enzyme
variant-29   236                        Confirm detrimental effect on enzyme
variant-30   237                        Confirm detrimental effect on enzyme
variant-31   265                        Confirm detrimental effect on enzyme
variant-32   299                        Confirm detrimental effect on enzyme
variant-33   107, 123, 145              New combinations of positive changes
variant-34   151, 167, 180              New combinations of positive changes
variant-35   194, 199, 267              New combinations of positive changes
variant-36   273, 293, 310              New combinations of positive changes
variant-37   332, 337, 355              New combinations of positive changes
variant-38   107, 151, 194, 273, 332    New combinations of positive changes
variant-39   123, 167, 199, 293, 337    New combinations of positive changes
variant-40   145, 180, 267, 310, 355    New combinations of positive changes
variant-41   107, 167, 267, 273, 337    New combinations of positive changes
variant-42   123, 180, 194, 293, 355    New combinations of positive changes
variant-43   145, 151, 199, 310, 332    New combinations of positive changes
variant-44   145, 167, 194              New combinations of positive changes
variant-45   180, 199, 273              New combinations of positive changes
variant-46   267, 293, 332              New combinations of positive changes
variant-47   310, 337, 107              New combinations of positive changes
variant-48   355, 123, 151              New combinations of positive changes
Sequence changes
Sequence changes
Position
Identify all sequences homologous to MX PEP and align using ClustalW



A: Substitutions set 
RULE 1a: 

• Identify all substitutions seen in 35 homologs from GenBank

• Consider only these substitutions (i.e., RULE 1a is a filter)



B: Substitutions from homologous sequences 

• Reconstruct phylogenetic tree 
RULE 1b:

• Calculate evolutionary proximity of the closest homolog in which each 
substitution occurs (EP) 

RULE 2b: 

• Calculate site heterogeneity at each substitution position (SH) 
RULE 3b: 

• Calculate entropy at each substitution position (SE) 
RULE 4b: 

• Calculate number of times a substitution is seen at a position in the set of
homologs (SN)





C: Substitutions from substitution matrices 
RULE 1c: 

• Calculate favorability of each substitution using a PAM250 matrix (SM). 








D: Score

Score = f(EP) x f(SH) x f(SE) x f(SN) x f(SM) 



Figure 26 
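Rules 1a and 4b of Figure 26 reduce to bookkeeping over the alignment. The sketch below assumes the parent sequence and its homologs are already aligned to equal length, with "-" for gaps. Positions are 1-based alignment columns. SN is the number of homologs carrying each (position, parent residue, new residue) substitution.

Because the figure does not define SH, it is taken here, as a stand-in, to be the number of distinct residues in the column. EP, which needs the phylogenetic tree, and SE are not covered.

```python
from collections import Counter


def substitution_set(parent, homologs):
    """Rule 1a + Rule 4b: every (position, parent residue, new residue) seen in an
    aligned homolog, mapped to SN, the number of homologs in which it occurs.
    Positions are 1-based alignment columns; gapped columns are skipped."""
    counts = Counter()
    for seq in homologs:
        for pos, (p, h) in enumerate(zip(parent, seq), start=1):
            if h != p and "-" not in (p, h):
                counts[(pos, p, h)] += 1
    return counts


def site_heterogeneity(parent, homologs):
    """Stand-in SH: number of distinct residues (gaps excluded) in each alignment column."""
    columns = zip(parent, *homologs)
    return {pos: len(set(col) - {"-"}) for pos, col in enumerate(columns, start=1)}
```

Restricting later scoring to the keys of `substitution_set` is what makes Rule 1a act as a filter: substitutions never seen in a homolog are not scored at all.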
Align heavy chain sequences from GenBank accession # AAF21612 with human
germline immunoglobulin heavy chain sequences from VBase using ClustalW.





A: Substitutions set 
RULE 1a: 

• Enumerate and classify the substitutions into 2 categories. 

• (i) Substitutions found in the framework region (FW) and 

• (ii) substitutions found in the complementarity determining region (CDR). 

• Consider only these substitutions (i.e., RULE 1a is a filter), and consider them
separately






B: Substitutions from human germline sequences

• Reconstruct phylogenetic tree 
RULE 1b: 

• Calculate evolutionary proximity of the closest homolog in which each 
substitution occurs (EP) 

RULE 2b: 

• Calculate site heterogeneity at each substitution position (SH) 
RULE 3b: 

• Calculate entropy at each substitution position (SE) 
RULE 4b: 

• Calculate number of times a substitution is seen at a position in the set of
homologs (SN) 






C: Substitutions from substitution matrices 
RULE 1c: 

• Calculate favorability of each substitution using a PAM100 matrix (SM). 




D: Score

Score_FW = f(EP) x f(SH) x f(SE) x f(SN) x f(SM)
Score_CDR = f(SE) x f(SN) x f(SM)



Figure 27 
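Figure 27 scores framework and CDR substitutions with different factor sets: all five factors for FW, and only SE, SN and SM for CDR. The sketch below makes two assumptions, neither taken from the figure:

- Each f() is a min-max rescaling to [0.1, 1], computed separately within the FW and CDR groups. The floor keeps a single poor factor from zeroing the whole product.
- Every raw factor is already oriented so that larger is better.

Function names are hypothetical.

```python
import math

FW_FACTORS = ("EP", "SH", "SE", "SN", "SM")
CDR_FACTORS = ("SE", "SN", "SM")


def rescale(values, floor=0.1):
    """Hypothetical f(): min-max scale to [floor, 1]; constant inputs all map to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [floor + (1.0 - floor) * (v - lo) / (hi - lo) for v in values]


def region_scores(candidates):
    """candidates: {name: (region, {factor: raw value})} with region 'FW' or 'CDR'.
    Returns {name: product of the rescaled factors used for that region}."""
    scores = {}
    for region, factors in (("FW", FW_FACTORS), ("CDR", CDR_FACTORS)):
        names = [n for n, (r, _) in candidates.items() if r == region]
        if not names:
            continue
        # Each factor is rescaled within its own region, keeping FW and CDR separate.
        scaled = {f: rescale([candidates[n][1][f] for n in names]) for f in factors}
        for i, n in enumerate(names):
            scores[n] = math.prod(scaled[f][i] for f in factors)
    return scores
```

Because CDR candidates use fewer factors, their scores are not directly comparable with FW scores. That is consistent with Rule 1a's instruction to consider the two groups separately.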
Align RSV-19 heavy chain sequence with human germline Ig heavy chain
sequences from VBase using ClustalW.





A: Substitutions set 
RULE 1a: 

• Enumerate and classify the substitutions into 2 categories.

• (i) Substitutions found in the framework region (FW) and 

• (ii) substitutions found in the complementarity determining region (CDR). 

• Consider only these substitutions (i.e., RULE 1a is a filter), and consider them
separately


B: Substitutions from human germline sequences 

• Reconstruct phylogenetic tree 
RULE 1b: 

• Calculate evolutionary proximity of the closest homolog in which each 
substitution occurs (EP) 

RULE 2b: 

• Calculate site heterogeneity at each substitution position (SH) 
RULE 3b: 

• Calculate entropy at each substitution position (SE) 
RULE 4b: 

• Calculate number of times a substitution is seen at a position in the set of 
homologs (SN) 






C: Substitutions from substitution matrices 
RULE 1c: 

• Calculate favorability of each substitution using a PAM100 matrix (SM).






D: Score

Score_FW = f(EP) x f(SH) x f(SE) x f(SN) x f(SM)
Score_CDR = f(SE) x f(SN) x f(SM)



Figure 28 



